SqueezCL: Squeezing OpenCL Kernels for Approximate Computing on Contemporary GPUs
Abstract
Approximate computing provides an opportunity to exploit application characteristics to improve the performance of computing systems. However, this opportunity must be balanced against the generality of the methods and the quality guarantees that the system designer can provide to the application developer. Improved parallel processing in graphics processing units (GPUs) provides one such means for data-level parallel applications. We propose SqueezCL, a software method to reduce the hardware resources used by an OpenCL kernel. SqueezCL transforms an exact OpenCL kernel into an approximate OpenCL kernel by squeezing the dimensions of its data elements. The core of SqueezCL leverages bitwidth reduction to shrink the hardware resources. Selectively reducing the precision and size of data elements generates approximate kernels that can be executed faster at the cost of some quality loss. Exploiting this opportunity is particularly important for GPU accelerators, which are inherently subject to memory resource constraints. We evaluate SqueezCL on a diverse set of data-level parallel OpenCL benchmarks from the AMD APP SDK v2.9. Experimental results on the AMD Radeon HD 5870 show that SqueezCL yields on average 1.1× higher performance with less than 10% quality loss, without requiring any changes to the underlying GPU hardware.
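The general idea of bitwidth reduction mentioned in the abstract can be illustrated with a minimal sketch. This is not SqueezCL's actual transformation, which the abstract does not detail. It assumes data elements are floats in a known range, and the function names (`squeeze`, `unsqueeze`, `footprint_bytes`, `quality_loss`) and the error metric are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: uniform quantization as one possible form of
# bitwidth reduction. All names and the quality metric are assumptions,
# not taken from the SqueezCL paper.

def squeeze(values, bits=8, lo=0.0, hi=1.0):
    """Map floats in [lo, hi] to unsigned integer codes of `bits` bits."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        clamped = min(max(v, lo), hi)  # out-of-range values saturate
        codes.append(round((clamped - lo) / (hi - lo) * levels))
    return codes

def unsqueeze(codes, bits=8, lo=0.0, hi=1.0):
    """Reconstruct approximate floats from the integer codes."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]

def footprint_bytes(count, bits):
    """Bytes needed to store `count` elements of `bits` bits each."""
    return (count * bits + 7) // 8

def quality_loss(exact, approx, lo=0.0, hi=1.0):
    """Mean absolute error normalized by the value range (assumed metric)."""
    total = sum(abs(e - a) for e, a in zip(exact, approx))
    return total / len(exact) / (hi - lo)

if __name__ == "__main__":
    exact = [i / 99 for i in range(100)]
    approx = unsqueeze(squeeze(exact, bits=8), bits=8)
    print("32-bit footprint:", footprint_bytes(len(exact), 32), "bytes")
    print("8-bit footprint: ", footprint_bytes(len(exact), 8), "bytes")
    print("quality loss:    ", quality_loss(exact, approx))
```

Narrower elements take less memory and bandwidth, and the reconstruction error is what shows up as quality loss. How a real kernel trades these off depends on the application and on the device.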
Similar resources
Iterative statistical kernels on contemporary GPUs
We present a study of three important kernels that occur frequently in iterative statistical applications: Multi-Dimensional Scaling (MDS), PageRank, and K-Means. We implemented each kernel using OpenCL and evaluated their performance on NVIDIA Tesla and NVIDIA Fermi GPGPU cards using dedicated hardware, and in the case of Fermi, also on the Amazon EC2 cloud-computing environment. By examining ...
From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming
In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide ...
Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the G...
Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs
We present a study of three important kernels that occur frequently in iterative statistical applications: K-Means, Multi-Dimensional Scaling (MDS), and PageRank. We implemented each kernel using OpenCL and evaluated their performance on an NVIDIA Tesla GPGPU card. By examining the underlying algorithms and empirically measuring the performance of various components of the kernel we explored the...
Directive-Based Compilers for GPUs
General Purpose Graphics Computing Units can be effectively used for enhancing the performance of many contemporary scientific applications. However, programming GPUs using machine-specific notations like CUDA or OpenCL can be complex and time consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a conven...
Publication date: 2015